class Player:
    """A golfer with an ID, name, and handicap."""

    def __init__(self, player_id, name, handicap):
        self.player_id = player_id
        self.name = name
        self.handicap = handicap

    def __repr__(self):
        """Developer-friendly representation (shows up in the REPL and debugger)."""
        return f'Player(player_id={self.player_id}, name={self.name!r}, handicap={self.handicap})'

    def __str__(self):
        """User-friendly representation (shows up when you print)."""
        return f'{self.name} (handicap {self.handicap})'


# Create an instance
bear = Player(player_id=1, name='Bear Woods', handicap=2.1)
print('repr:', repr(bear))
print('str: ', str(bear))
print()
print('Accessing attributes:')
print(f' Name: {bear.name}')
print(f' Handicap: {bear.handicap}')
print(f' Type of handicap: {type(bear.handicap)}')

Classes and Data Modeling
What You’ll Learn
- Why raw dictionaries-of-strings from CSV files become a problem as your code grows
- How to use Python classes to model real-world things like players, courses, rounds, and shots
- The difference between regular classes and dataclasses, and when to use each
- How to load CSV data into typed, well-structured objects
- How to add behavior (methods) to your data models
- How inheritance works and when it is (and is not) worth using
- How to model nested JSON data from a real iPhone shot-tracking app
Concept
What Is a Data Model?
A data model is the way you represent real-world things in code. Every time you read a CSV file with csv.DictReader, you get back a list of dictionaries where every value is a string:
{'player_id': '1', 'name': 'Bear Woods', 'handicap': '2.1'}

This works for quick scripts, but it has real problems:
- No type safety. player['handicap'] is the string '2.1', not a float. You have to remember to convert it every time you use it, and if you forget, Python will happily compare '10.4' < '2.1' using string ordering, which evaluates to True because the character '1' sorts before '2'.
- No discoverability. If you hand someone a dictionary, they have to guess what keys it contains. There is no autocomplete, no documentation, no way to know whether the key is 'handicap' or 'hcp' or 'handicap_index' without reading the CSV header.
- No behavior. A dictionary cannot calculate a handicap differential or format a scorecard. You end up writing standalone functions that take dictionaries as arguments, and the connection between the data and the operations on that data is purely in your head.
- No validation. Nothing prevents you from creating a player dict with a negative handicap or a missing name. Bugs show up far from where they were introduced.
A proper data model solves all of these problems by giving your data structure, types, and behavior.
Mapping Real-World Things to Code
Think about what we are modeling in our golf dataset:
- A Player has an ID, a name, and a handicap.
- A Course has an ID, a name, a city, a state, a slope rating, and a course rating.
- A Hole belongs to a course and has a hole number, par, yardage, and handicap index.
- A Round records that a specific player played a specific course on a specific date, with a total score and weather conditions.
- A Shot records one swing within a round: the hole, shot number, club used, where the ball started, where it ended, and the strokes gained value.
Each of these is a noun in the golf domain, and each has specific attributes (the data it carries) and behavior (things you can compute from it). In Python, we model nouns as classes and behavior as methods.
Classes vs Dataclasses
Python gives you two main ways to define a class:
Regular classes require you to write __init__, __repr__, __eq__, and other boilerplate methods by hand. This gives you full control but involves a lot of repetitive code.
Dataclasses (introduced in Python 3.7) automatically generate __init__, __repr__, __eq__, and more based on the fields you declare. You just list the attributes and their types, and Python does the rest.
| Feature | Regular Class | Dataclass |
|---|---|---|
| __init__ generated? | No, you write it | Yes, automatic |
| __repr__ generated? | No, you write it | Yes, automatic |
| __eq__ generated? | No, you write it | Yes, automatic |
| Type annotations? | Optional | Required (used to define fields) |
| Best for | Complex behavior, custom initialization | Data containers, records, domain objects |
Rule of thumb for data science: Start with dataclasses. They cover 90% of what you need. Switch to a regular class only if you need custom __init__ logic or complex internal state management.
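Simple validation does not by itself force a switch to a regular class. As a minimal sketch (the class name ValidatedPlayer and the specific checks are assumptions for illustration, following the "negative handicap or missing name" example above), a dataclass can define __post_init__, which runs right after the generated __init__:

```python
from dataclasses import dataclass


@dataclass
class ValidatedPlayer:
    """Hypothetical variant of Player that rejects obviously bad data."""
    player_id: int
    name: str
    handicap: float

    def __post_init__(self):
        # Called automatically after the generated __init__ has set the fields.
        if not self.name.strip():
            raise ValueError('name must not be empty')
        # Assumed rule for this sketch: treat negative handicaps as invalid.
        if self.handicap < 0:
            raise ValueError(f'handicap must not be negative, got {self.handicap}')


ok = ValidatedPlayer(player_id=1, name='Bear Woods', handicap=2.1)

try:
    ValidatedPlayer(player_id=2, name='', handicap=10.0)
except ValueError as e:
    error_message = str(e)
    print(f'Rejected: {error_message}')
```

The bad record is rejected at the moment it is created, rather than surfacing later as a confusing result somewhere downstream.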
The Domain Model
When you define a set of classes that represent the key concepts in your problem area, you have created a domain model. For our golf data, the domain model is the collection of Player, Course, Hole, Round, and Shot classes, along with the relationships between them (a Round references a Player and a Course; a Shot belongs to a Round).
A good domain model makes your code read like a description of the problem. Instead of:
score_diff = int(row['total_score']) - float(course_dict['course_rating'])

you can write:

score_diff = round.total_score - course.course_rating

Same logic, but the second version is self-documenting.
Code
1. A Basic Class: Player
Let’s start by building a Player class the traditional way, with a manual __init__ method. We will also add __repr__ (for developer-friendly output) and __str__ (for user-friendly output).
Compare this to a raw dictionary from csv.DictReader:
# This is what csv.DictReader gives us
bear_dict = {'player_id': '1', 'name': 'Bear Woods', 'handicap': '2.1'}
print('Dictionary handicap:', bear_dict['handicap'], type(bear_dict['handicap']))
print('Class handicap: ', bear.handicap, type(bear.handicap))
print()
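# The string comparison trap: strings compare character by character,
# so "10.4" sorts before "2.1" even though 10.4 is the larger number
print('String comparison: "10.4" < "2.1" =>', '10.4' < '2.1')
print('Float comparison: 10.4 < 2.1 =>', 10.4 < 2.1)

2. Instance Methods: Adding Behavior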
Classes become powerful when you attach methods – functions that operate on the instance’s data. Let’s add two methods to Player:
- format_handicap() returns a nicely formatted string like "+2.1" or "+25.3"
- handicap_differential(score, course_rating, slope_rating) computes the USGA handicap differential formula:
\[\text{differential} = \frac{(\text{score} - \text{course rating}) \times 113}{\text{slope rating}}\]
The number 113 is the standard slope rating (the slope of a course of average difficulty).
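To get a feel for how the slope rating scales the result, here is a small standalone sketch of the same formula. The score of 85 and course rating of 71.1 match the North Park example used in this topic; the slope values 113 and 140 are chosen purely for illustration.

```python
def handicap_differential(score, course_rating, slope_rating):
    """(score - course rating) * 113 / slope rating, rounded to one decimal."""
    return round((score - course_rating) * 113 / slope_rating, 1)


# Same score and course rating, three different slope ratings.
standard = handicap_differential(85, 71.1, 113)  # slope equal to the standard 113
moderate = handicap_differential(85, 71.1, 117)
hard = handicap_differential(85, 71.1, 140)      # illustrative high slope

print(standard, moderate, hard)
```

With a slope of exactly 113 the differential is simply the score minus the course rating. A higher slope shrinks the differential, so the same score counts as a better performance on a harder course.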
class Player:
    """A golfer with an ID, name, and handicap."""

    def __init__(self, player_id, name, handicap):
        self.player_id = player_id
        self.name = name
        self.handicap = handicap

    def __repr__(self):
        return f'Player(player_id={self.player_id}, name={self.name!r}, handicap={self.handicap})'

    def __str__(self):
        return f'{self.name} ({self.format_handicap()})'

    def format_handicap(self):
        """Return the handicap as a formatted string like '+2.1'."""
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        """Calculate the USGA handicap differential for a single round.

        Formula: (score - course_rating) * 113 / slope_rating
        """
        return round((score - course_rating) * 113 / slope_rating, 1)


bear = Player(1, 'Bear Woods', 2.1)
print(bear)
print()
# Bear shot 85 at North Park (course rating 71.1, slope 117)
diff = bear.handicap_differential(score=85, course_rating=71.1, slope_rating=117)
print('Bear shot 85 at North Park:')
print(f' Handicap differential: {diff}')

3. Dataclasses: Eliminating Boilerplate
Look at how much code we wrote just for the __init__ and __repr__ methods above. For a data container like Player, this is pure boilerplate. The dataclasses module generates it for us.
The @dataclass decorator reads the class’s type-annotated fields and automatically generates __init__, __repr__, and __eq__.
from dataclasses import dataclass


@dataclass
class Player:
    """A golfer with an ID, name, and handicap."""
    player_id: int
    name: str
    handicap: float

    def format_handicap(self):
        """Return the handicap as a formatted string like '+2.1'."""
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        """Calculate the USGA handicap differential for a single round."""
        return round((score - course_rating) * 113 / slope_rating, 1)


bear = Player(player_id=1, name='Bear Woods', handicap=2.1)
# __repr__ is generated automatically
print(repr(bear))
# __eq__ is generated automatically -- compares all fields
bear2 = Player(player_id=1, name='Bear Woods', handicap=2.1)
print(f'bear == bear2: {bear == bear2}')
# Our custom methods still work
print(f'Formatted handicap: {bear.format_handicap()}')

Dataclass features: defaults and frozen
Dataclass fields can have default values. Fields with defaults must come after fields without defaults (just like function arguments).
The frozen=True option makes instances immutable – you cannot change their attributes after creation. This is useful for data that should not be accidentally modified.
@dataclass(frozen=True)
class CourseInfo:
    """Immutable course data -- cannot be modified after creation."""
    name: str
    slope_rating: int
    course_rating: float
    city: str = 'Pittsburgh'
    state: str = 'PA'


north_park = CourseInfo(
    name='North Park Golf Course',
    slope_rating=117,
    course_rating=71.1
)
print(north_park)
print(f'City defaults to: {north_park.city}')
print()
# Try to modify a frozen dataclass
try:
    north_park.slope_rating = 999
except AttributeError as e:
    print(f'Cannot modify frozen dataclass: {e}')

4. Building the Golf Domain Model
Now let’s define dataclasses for every entity in our golf dataset. Each class mirrors one CSV file. We will keep them simple – just data containers with proper types.
from dataclasses import dataclass


@dataclass
class Player:
    """A golfer."""
    player_id: int
    name: str
    handicap: float

    def format_handicap(self):
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        """USGA handicap differential: (score - CR) * 113 / slope."""
        return round((score - course_rating) * 113 / slope_rating, 1)


@dataclass
class Course:
    """A golf course."""
    course_id: int
    name: str
    city: str
    state: str
    slope_rating: int
    course_rating: float


@dataclass
class Hole:
    """A single hole on a course."""
    course_id: int
    hole_number: int
    par: int
    yardage: int
    handicap_index: int


@dataclass
class Round:
    """A recorded round of golf."""
    round_id: int
    player_id: int
    course_id: int
    date: str
    total_score: int
    weather: str


@dataclass
class Shot:
    """A single shot within a round."""
    round_id: int
    hole: int
    shot_number: int
    club: str
    start_lie: str
    start_distance_to_pin: float
    end_lie: str
    end_distance_to_pin: float
    strokes_gained: float


print('Domain model defined: Player, Course, Hole, Round, Shot')

Notice how each field has a clear type. When we load data from CSV, we will convert strings to the proper types at load time, and from that point on everything is typed correctly.
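One caveat explains why the conversion has to happen at load time: Python does not check dataclass annotations at runtime. The short sketch below uses a trimmed copy of Player with sample values, and shows that raw CSV strings are accepted silently unless you convert them yourself.

```python
from dataclasses import dataclass


@dataclass
class Player:
    player_id: int
    name: str
    handicap: float


# Annotations are documentation for readers and tools; they are not enforced,
# so passing raw strings "works" and the wrong types are stored.
raw = Player(player_id='1', name='Bear Woods', handicap='2.1')
print(type(raw.handicap).__name__)

# Converting explicitly when the object is built gives real numbers.
typed = Player(player_id=int('1'), name='Bear Woods', handicap=float('2.1'))
print(type(typed.handicap).__name__)
```

The first object still carries a string handicap, which is exactly the problem the domain model is meant to remove, so the conversion belongs in whatever code constructs the objects.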
5. Loading CSV Data into Dataclasses
The bridge between raw CSV data and our domain model is a loading function (or a @classmethod factory). The pattern is:
- Read the CSV with csv.DictReader (which gives dictionaries of strings).
- For each row, convert strings to proper types and create a dataclass instance.
- Return a list of typed objects.
We will use @classmethod factory methods so each class knows how to construct itself from a CSV row.
import csv
from dataclasses import dataclass
from pathlib import Path

DATA_DIR = Path('../../data')


@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

    @classmethod
    def from_csv_row(cls, row):
        """Create a Player from a csv.DictReader row."""
        return cls(
            player_id=int(row['player_id']),
            name=row['name'],
            handicap=float(row['handicap']),
        )

    def format_handicap(self):
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        return round((score - course_rating) * 113 / slope_rating, 1)


@dataclass
class Course:
    course_id: int
    name: str
    city: str
    state: str
    slope_rating: int
    course_rating: float

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            course_id=int(row['course_id']),
            name=row['name'],
            city=row['city'],
            state=row['state'],
            slope_rating=int(row['slope_rating']),
            course_rating=float(row['course_rating']),
        )


@dataclass
class Hole:
    course_id: int
    hole_number: int
    par: int
    yardage: int
    handicap_index: int

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            course_id=int(row['course_id']),
            hole_number=int(row['hole_number']),
            par=int(row['par']),
            yardage=int(row['yardage']),
            handicap_index=int(row['handicap_index']),
        )


@dataclass
class Round:
    round_id: int
    player_id: int
    course_id: int
    date: str
    total_score: int
    weather: str

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            round_id=int(row['round_id']),
            player_id=int(row['player_id']),
            course_id=int(row['course_id']),
            date=row['date'],
            total_score=int(row['total_score']),
            weather=row['weather'],
        )


@dataclass
class Shot:
    round_id: int
    hole: int
    shot_number: int
    club: str
    start_lie: str
    start_distance_to_pin: float
    end_lie: str
    end_distance_to_pin: float
    strokes_gained: float

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            round_id=int(row['round_id']),
            hole=int(row['hole']),
            shot_number=int(row['shot_number']),
            club=row['club'],
            start_lie=row['start_lie'],
            start_distance_to_pin=float(row['start_distance_to_pin']),
            end_lie=row['end_lie'],
            end_distance_to_pin=float(row['end_distance_to_pin']),
            strokes_gained=float(row['strokes_gained']),
        )


print('All dataclasses defined with from_csv_row() factory methods.')

Now let’s write a generic loader function and use it to load all our data.
def load_csv(filepath, cls):
    """Read a CSV file and return a list of dataclass instances.

    Args:
        filepath: Path to the CSV file.
        cls: A dataclass with a from_csv_row() classmethod.

    Returns:
        A list of instances of cls.
    """
    # newline='' is the csv module's recommended way to open CSV files
    with open(filepath, newline='') as f:
        reader = csv.DictReader(f)
        return [cls.from_csv_row(row) for row in reader]


# Load all the data
players = load_csv(DATA_DIR / 'players.csv', Player)
courses = load_csv(DATA_DIR / 'courses.csv', Course)
holes = load_csv(DATA_DIR / 'holes.csv', Hole)
rounds = load_csv(DATA_DIR / 'rounds.csv', Round)
shots = load_csv(DATA_DIR / 'shots.csv', Shot)
print(f'Players: {len(players)}')
print(f'Courses: {len(courses)}')
print(f'Holes: {len(holes)}')
print(f'Rounds: {len(rounds)}')
print(f'Shots: {len(shots)}')

Now let’s see the benefit. Every attribute is the correct type, and the objects print clearly.
# Inspect the loaded objects
for p in players:
    print(p)
print()
# Types are correct
bear = players[0]
print(f'{bear.name} handicap: {bear.handicap} (type: {type(bear.handicap).__name__})')
print()
# We can sort players by handicap -- no string conversion needed
sorted_players = sorted(players, key=lambda p: p.handicap)
print('Players sorted by handicap (best to worst):')
for p in sorted_players:
    print(f' {p.name:20s} {p.format_handicap()}')

# Inspect courses
for c in courses:
    print(f'{c.name:30s} ({c.city}, {c.state}) Slope: {c.slope_rating} CR: {c.course_rating}')

# Build lookup dictionaries for quick access by ID
player_lookup = {p.player_id: p for p in players}
course_lookup = {c.course_id: c for c in courses}
# Now we can resolve a round to actual objects
r = rounds[0]
player = player_lookup[r.player_id]
course = course_lookup[r.course_id]
print(f'Round {r.round_id}: {player.name} played {course.name} on {r.date}')
print(f' Score: {r.total_score} | Weather: {r.weather}')
print(f' Relative to course rating: {r.total_score - course.course_rating:+.1f}')

6. Adding Behavior: Methods on Domain Objects
Now that we have typed objects and lookup dictionaries, we can compute useful golf statistics from them. Let’s derive course-level totals from the holes data and create a RoundAnalyzer helper class.
Course total par from its holes
A course’s total par is the sum of the par values for all 18 (or 9) holes. Let’s compute this from the holes data.
# Group holes by course
holes_by_course = {}
for h in holes:
    if h.course_id not in holes_by_course:
        holes_by_course[h.course_id] = []
    holes_by_course[h.course_id].append(h)


def course_total_par(course_id):
    """Calculate the total par for a course from its holes."""
    return sum(h.par for h in holes_by_course[course_id])


def course_total_yardage(course_id):
    """Calculate the total yardage for a course from its holes."""
    return sum(h.yardage for h in holes_by_course[course_id])


for c in courses:
    par = course_total_par(c.course_id)
    yards = course_total_yardage(c.course_id)
    print(f'{c.name:30s} Par {par} | {yards:,} yards | Slope {c.slope_rating} | CR {c.course_rating}')

RoundAnalyzer: computing scoring breakdowns
Let’s build a class that takes a round and its associated shot data and computes useful statistics. This is a good example of a class that is not a pure data container – it exists to provide behavior.
class RoundAnalyzer:
    """Analyzes a single round of golf."""

    SCORING_NAMES = {
        -3: 'albatross', -2: 'eagle', -1: 'birdie',
        0: 'par', 1: 'bogey', 2: 'double bogey', 3: 'triple bogey',
    }

    def __init__(self, round_obj, round_shots, course, holes_for_course):
        self.round = round_obj
        self.shots = round_shots
        self.course = course
        self.holes = {h.hole_number: h for h in holes_for_course}

    def strokes_per_hole(self):
        """Return a dict mapping hole_number -> number of strokes."""
        counts = {}
        for s in self.shots:
            counts[s.hole] = counts.get(s.hole, 0) + 1
        return counts

    def relative_to_par(self):
        """Return total score relative to course par."""
        total_par = sum(h.par for h in self.holes.values())
        return self.round.total_score - total_par

    def scoring_breakdown(self):
        """Return a dict counting birdies, pars, bogeys, etc."""
        breakdown = {}
        for hole_num, strokes in self.strokes_per_hole().items():
            par = self.holes[hole_num].par
            diff = strokes - par
            label = self.SCORING_NAMES.get(diff, f'+{diff}' if diff > 0 else str(diff))
            breakdown[label] = breakdown.get(label, 0) + 1
        return breakdown

    def total_strokes_gained(self):
        """Return the total strokes gained for the round."""
        return round(sum(s.strokes_gained for s in self.shots), 2)

    def print_scorecard(self, player):
        """Print a formatted scorecard."""
        rtp = self.relative_to_par()
        rtp_str = f'+{rtp}' if rtp > 0 else ('E' if rtp == 0 else str(rtp))
        print(f'=== {player.name} at {self.course.name} ({self.round.date}) ===')
        print(f'Score: {self.round.total_score} ({rtp_str}) | Weather: {self.round.weather}')
        print(f'Total Strokes Gained: {self.total_strokes_gained()}')
        print()
        breakdown = self.scoring_breakdown()
        print('Scoring breakdown:')
        order = ['eagle', 'birdie', 'par', 'bogey', 'double bogey', 'triple bogey']
        for label in order:
            if label in breakdown:
                print(f' {label:>15s}: {breakdown[label]}')
        # Print any labels not in the standard order
        for label, count in breakdown.items():
            if label not in order:
                print(f' {label:>15s}: {count}')


print('RoundAnalyzer class defined.')

# Group shots by round_id
shots_by_round = {}
for s in shots:
    if s.round_id not in shots_by_round:
        shots_by_round[s.round_id] = []
    shots_by_round[s.round_id].append(s)
# Analyze Bear Woods' first round (round_id=1, North Park)
r = rounds[0]  # round_id=1
player = player_lookup[r.player_id]
course = course_lookup[r.course_id]
analyzer = RoundAnalyzer(
    round_obj=r,
    round_shots=shots_by_round[r.round_id],
    course=course,
    holes_for_course=holes_by_course[r.course_id],
)
analyzer.print_scorecard(player)

# Print a summary for every round
print(f'{"Player":20s} {"Course":30s} {"Date":12s} {"Score":>5s} {"vs Par":>7s} {"SG":>7s}')
print('-' * 85)
for r in rounds:
    player = player_lookup[r.player_id]
    course = course_lookup[r.course_id]
    analyzer = RoundAnalyzer(r, shots_by_round[r.round_id], course, holes_by_course[r.course_id])
    rtp = analyzer.relative_to_par()
    rtp_str = f'+{rtp}' if rtp > 0 else ('E' if rtp == 0 else str(rtp))
    sg = analyzer.total_strokes_gained()
    print(f'{player.name:20s} {course.name:30s} {r.date:12s} {r.total_score:>5d} {rtp_str:>7s} {sg:>+7.2f}')

Handicap differentials per player
Now we can use Player.handicap_differential() with real data to show each player’s differentials across their rounds.
for p in players:
    player_rounds = [r for r in rounds if r.player_id == p.player_id]
    diffs = []
    for r in player_rounds:
        c = course_lookup[r.course_id]
        diff = p.handicap_differential(r.total_score, c.course_rating, c.slope_rating)
        diffs.append(diff)
    if not diffs:
        # A player with no recorded rounds has no average to report
        continue
    avg_diff = sum(diffs) / len(diffs)
    print(f'{p.name:20s} (handicap {p.format_handicap()})')
    print(f' Differentials: {diffs}')
    print(f' Average: {avg_diff:.1f}')
    print()

7. Inheritance (Brief)
Inheritance lets one class extend another. The child class gets all the parent’s attributes and methods, and can add or override them.
In data science work, inheritance is less common than in traditional software engineering. Dataclasses and composition (building objects that contain other objects) usually cover what you need. But it is worth knowing the concept.
@dataclass
class GolfRecord:
    """Base class for any record that has an ID."""
    record_id: int

    def describe(self):
        return f'{self.__class__.__name__} #{self.record_id}'


@dataclass
class TournamentRound(GolfRecord):
    """A round played in a tournament -- extends GolfRecord."""
    player_name: str
    course_name: str
    score: int
    tournament_name: str = 'Pittsburgh Open'


tr = TournamentRound(
    record_id=1,
    player_name='Bear Woods',
    course_name='North Park Golf Course',
    score=85,
)
print(tr)
print(tr.describe())  # inherited from GolfRecord
print(f'Is TournamentRound a GolfRecord? {isinstance(tr, GolfRecord)}')

Inheritance creates an “is-a” relationship: a TournamentRound is a GolfRecord. This can be useful for shared behavior, but it also creates tight coupling between classes.
For data science work, prefer:
- Dataclasses for data containers (Player, Course, Round, etc.)
- Composition for complex behavior (RoundAnalyzer has a Round, not is a Round)
- Inheritance only when you have a genuine “is-a” relationship with shared behavior
You will rarely need deep inheritance hierarchies. If you find yourself building class trees more than two levels deep, step back and consider whether a simpler approach would work.
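For contrast with the inheritance example, here is a minimal composition sketch. RoundSummary is a hypothetical class, not part of the course code: it holds a Player and a Course as fields instead of inheriting from either, and the classes are trimmed to the fields the example needs.

```python
from dataclasses import dataclass


@dataclass
class Player:
    player_id: int
    name: str
    handicap: float


@dataclass
class Course:
    course_id: int
    name: str
    course_rating: float


@dataclass
class RoundSummary:
    """Hypothetical class: it *has a* Player and a Course (composition)."""
    player: Player
    course: Course
    total_score: int

    def versus_rating(self):
        """Score relative to the course rating, rounded to one decimal."""
        return round(self.total_score - self.course.course_rating, 1)


summary = RoundSummary(
    player=Player(1, 'Bear Woods', 2.1),
    course=Course(1, 'North Park Golf Course', 71.1),
    total_score=85,
)
print(f'{summary.player.name}: {summary.versus_rating():+.1f} vs course rating')
```

Because RoundSummary only reads the fields it needs, Player and Course can change independently, which is the looser coupling that composition is meant to provide.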
8. Connecting to the iPhone App Model
Our data directory also contains shot-tag-round.json, which is a real export from an iPhone shot-tracking app. The JSON structure is nested: a round contains shots, each shot has a club object and GPS coordinates.
Without a data model, you end up writing code like:
data['shots'][0]['club']['name']  # what even is this?

Let’s define dataclasses that mirror the JSON structure and load the data into typed objects.
import json

# First, let's see what raw JSON access looks like
with open(DATA_DIR / 'shot-tag-round.json', 'r') as f:
    raw_data = json.load(f)
print('Top-level keys:', list(raw_data.keys()))
print(f'Course: {raw_data["courseName"]}')
print(f'Number of shots: {len(raw_data["shots"])}')
print()
# Accessing nested data with raw dicts is fragile and hard to read
first_shot = raw_data['shots'][0]
print('First shot (raw dict):')
print(f' Club: {first_shot["club"]["name"]}')
print(f' Lat: {first_shot["coordinate"]["latitude"]}')
print(f' Lon: {first_shot["coordinate"]["longitude"]}')

@dataclass
class Coordinate:
    """A GPS coordinate."""
    latitude: float
    longitude: float

    @classmethod
    def from_dict(cls, data):
        return cls(latitude=data['latitude'], longitude=data['longitude'])


@dataclass
class ClubInfo:
    """Club details from the shot-tracking app."""
    code: str
    family: str
    club_id: int
    name: str

    @classmethod
    def from_dict(cls, data):
        return cls(
            code=data['code'],
            family=data['family'],
            club_id=data['id'],
            name=data['name'],
        )


@dataclass
class ShotTagShot:
    """A single shot from the iPhone app."""
    shot_id: str
    club: ClubInfo
    coordinate: Coordinate
    course_name: str
    horizontal_accuracy: float
    timestamp: float

    @classmethod
    def from_dict(cls, data):
        return cls(
            shot_id=data['id'],
            club=ClubInfo.from_dict(data['club']),
            coordinate=Coordinate.from_dict(data['coordinate']),
            course_name=data['courseName'],
            horizontal_accuracy=data['horizontalAccuracy'],
            timestamp=data['timestamp'],
        )


@dataclass
class ShotTagRound:
    """A round exported from the iPhone shot-tracking app."""
    round_id: str
    course_name: str
    start_date: float
    end_date: float
    hole_boundaries: list
    hole_pin_locations: list  # list of Coordinate
    shots: list  # list of ShotTagShot

    @classmethod
    def from_dict(cls, data):
        return cls(
            round_id=data['id'],
            course_name=data['courseName'],
            start_date=data['startDate'],
            end_date=data['endDate'],
            hole_boundaries=data['holeBoundaries'],
            hole_pin_locations=[Coordinate.from_dict(loc) for loc in data['holePinLocations']],
            shots=[ShotTagShot.from_dict(s) for s in data['shots']],
        )


print('ShotTagRound, ShotTagShot, ClubInfo, and Coordinate dataclasses defined.')

# Load the JSON into our typed model
with open(DATA_DIR / 'shot-tag-round.json', 'r') as f:
    raw_data = json.load(f)
app_round = ShotTagRound.from_dict(raw_data)
print(f'Course: {app_round.course_name}')
print(f'Holes tracked: {len(app_round.hole_pin_locations)}')
print(f'Total shots: {len(app_round.shots)}')
print(f'Hole boundaries: {app_round.hole_boundaries}')
print()
# Now accessing nested data is clean and readable
first_shot = app_round.shots[0]
print('First shot:')
print(f' Club: {first_shot.club.name} ({first_shot.club.code})')
print(f' Family: {first_shot.club.family}')
print(f' Location: ({first_shot.coordinate.latitude:.6f}, {first_shot.coordinate.longitude:.6f})')
print(f' Accuracy: {first_shot.horizontal_accuracy:.1f} meters')

# Use the hole boundaries to group shots by hole
boundaries = app_round.hole_boundaries
print('Pin locations and shots per hole:')
print(f'{"Hole":>4s} {"Shots":>5s} {"Pin Lat":>12s} {"Pin Lon":>12s} Clubs Used')
print('-' * 75)
for i in range(len(boundaries) - 1):
    hole_num = i + 1
    start_idx = boundaries[i]
    end_idx = boundaries[i + 1]
    hole_shots = app_round.shots[start_idx:end_idx]
    pin = app_round.hole_pin_locations[i]
    clubs = [s.club.code for s in hole_shots]
    print(f'{hole_num:>4d} {len(hole_shots):>5d} {pin.latitude:>12.6f} {pin.longitude:>12.6f} {" -> ".join(clubs)}')
# Handle the last hole (from last boundary to end of shots)
last_hole = len(boundaries)
last_shots = app_round.shots[boundaries[-1]:]
last_pin = app_round.hole_pin_locations[-1]
last_clubs = [s.club.code for s in last_shots]
print(f'{last_hole:>4d} {len(last_shots):>5d} {last_pin.latitude:>12.6f} {last_pin.longitude:>12.6f} {" -> ".join(last_clubs)}')

Compare the typed access above to what the raw dict version would look like:
# Raw dict -- hard to read, no autocomplete, no type safety
raw_data['shots'][0]['club']['name']
raw_data['holePinLocations'][0]['latitude']
# Typed dataclass -- clear, discoverable, self-documenting
app_round.shots[0].club.name
app_round.hole_pin_locations[0].latitude

The dataclass version is not just cleaner. Because the fields are declared up front, editors and static type checkers can flag a misspelled attribute before the code runs, whereas a misspelled dictionary key only surfaces at runtime.
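To make the point about typos concrete, the sketch below uses a trimmed, two-field version of ClubInfo with made-up sample values. It compares what happens when a name is misspelled on a dataclass with what happens when the same typo is hidden behind dict.get().

```python
from dataclasses import dataclass


@dataclass
class ClubInfo:
    code: str
    name: str


# A misspelled field name fails immediately when the object is constructed.
try:
    ClubInfo(code='7i', nmae='7 Iron')
except TypeError as e:
    construct_error = type(e).__name__

club = ClubInfo(code='7i', name='7 Iron')

# A misspelled attribute fails on the exact line that contains the typo.
try:
    club.nmae
except AttributeError as e:
    access_error = type(e).__name__

# A raw dict read with .get() hides the same typo behind a silent None.
raw_club = {'code': '7i', 'name': '7 Iron'}
silent = raw_club.get('nmae')

print(construct_error, access_error, silent)
```

Both dataclass failures still happen when the code runs, but they are loud and point at the offending line, and a type checker or editor can usually report them before execution.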
AI
Exercise 1: Ask AI to Design a Golf Domain Model
Give an AI assistant the following description and ask it to design a data model:
Prompt to use:
I have a golf tracking application. Players play rounds at courses. Each course has 18 holes with different pars and yardages. Each round records the date, weather, and total score. Within each round, every shot is tracked with the club used, where the ball started (lie and distance to pin), where it ended, and a strokes gained value. Design a Python data model using dataclasses to represent this domain.
Evaluate the AI’s response:
- Did it identify the right entities? (Player, Course, Hole, Round, Shot)
- Did it use appropriate types? (int for IDs, float for distances, str for names)
- Did it over-engineer? Common over-engineering: adding an Enum for weather or lie types, creating abstract base classes, adding methods you did not ask for, or building an ORM-style relationship system.
- Did it miss anything? For example, did it include slope rating and course rating on the Course, or just par?
- How does it compare to the model we built above?
# Paste the AI-generated data model here and compare it to ours.
# Note what the AI got right, what it missed, and what it over-engineered.

Exercise 2: Ask AI to Calculate Handicap Index
Give the AI our Player dataclass and ask it to add a method that calculates a handicap index from a list of rounds.
Prompt to use:
Here is my Player dataclass:
@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

    def handicap_differential(self, score, course_rating, slope_rating):
        return round((score - course_rating) * 113 / slope_rating, 1)

Add a method calculate_handicap_index that takes a list of (score, course_rating, slope_rating) tuples representing recent rounds and returns the handicap index. The formula is: take the best 8 differentials out of the most recent 20 and average them.
Evaluate the AI’s response:
- Does it correctly take the best (lowest) 8 differentials?
- Does it handle the case where the player has fewer than 20 rounds? (The USGA has a table for this, but a reasonable simplification is to take the best N/2 differentials if fewer than 20 rounds are available.)
- Does it return the plain average? (The 0.96 multiplier belonged to the pre-2020 USGA formula, which averaged the best 10 of 20; the current World Handicap System averages the best 8 with no multiplier, so an AI that adds 0.96 is mixing the two.)
- Does it reuse the existing handicap_differential method, or rewrite the formula?
# Paste the AI-generated method here.
# Test it with real data from our rounds to see if the output makes sense.

Exercise 3: Ask AI to Convert Dataclasses to/from JSON
Prompt to use:
I have these Python dataclasses for golf data:
@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

@dataclass
class Round:
    round_id: int
    player_id: int
    course_id: int
    date: str
    total_score: int
    weather: str

Write functions to serialize a list of these objects to JSON and deserialize them back. Handle the Round’s date field as a proper date object (not just a string).
Evaluate the AI’s response:
- Does it handle serialization? (json.dumps cannot serialize dataclass instances by default – the AI needs to provide a solution like dataclasses.asdict() or a custom encoder.)
- Does it handle deserialization? (JSON gives back plain dicts – the AI needs to reconstruct dataclass instances.)
- Does it handle the date field? (Converting between str and datetime.date requires datetime.strptime or date.fromisoformat().)
- Does it handle nested objects? (If you include a Player reference inside Round, can the AI serialize/deserialize that relationship?)
- Is the solution overly complex? (Using dataclasses.asdict() is much simpler than writing a custom JSON encoder from scratch.)
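As a yardstick for judging the AI’s answer, here is one minimal way the round trip could look. It is a sketch under stated assumptions, not a reference solution: the function names rounds_to_json and rounds_from_json are hypothetical, dates are assumed to be stored in ISO format (YYYY-MM-DD), and the sample round is made up.

```python
import datetime
import json
from dataclasses import asdict, dataclass


@dataclass
class Round:
    round_id: int
    player_id: int
    course_id: int
    date: datetime.date
    total_score: int
    weather: str


def rounds_to_json(rounds):
    """Serialize Round objects; each date becomes an ISO string."""
    rows = []
    for r in rounds:
        row = asdict(r)                  # dataclass -> plain dict
        row['date'] = r.date.isoformat()  # date objects are not JSON-serializable
        rows.append(row)
    return json.dumps(rows)


def rounds_from_json(text):
    """Rebuild Round objects from the JSON produced by rounds_to_json()."""
    rounds = []
    for row in json.loads(text):
        row['date'] = datetime.date.fromisoformat(row['date'])
        rounds.append(Round(**row))
    return rounds


original = [Round(1, 1, 1, datetime.date(2024, 6, 1), 85, 'sunny')]
restored = rounds_from_json(rounds_to_json(original))
print(restored == original)
```

The equality check works because dataclasses generate __eq__, which makes round-trip tests like this one cheap to write.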
# Paste the AI-generated serialization code here.
# Test round-tripping: serialize to JSON, deserialize back,
# and verify the result matches the original.

Summary
When to Use Dicts vs Classes vs Dataclasses
| Approach | Best For | Tradeoffs |
|---|---|---|
| Plain dict | Quick scripts, throwaway analysis, exploring data for the first time | No type safety, no discoverability, no validation, no behavior |
| Regular class | Complex initialization logic, internal state management, custom __init__ behavior | More boilerplate, full control |
| Dataclass | Data containers, domain models, records, anything where the primary purpose is carrying data with proper types | Minimal boilerplate, auto-generated __init__/__repr__/__eq__, can still add methods |
| Frozen dataclass | Immutable records, lookup data, configuration that should not change | Same as dataclass but prevents accidental mutation |
The progression in this course:
- Topics 01-04: Plain dicts from csv.DictReader (fine for learning loops, comprehensions, etc.)
- Topic 05 (this one): Dataclasses to give structure and types to our data
- Topics 06+: Pandas DataFrames for tabular analysis (a different kind of structure, optimized for columnar operations)
Key Syntax Reference
Regular class:

class Player:
    def __init__(self, player_id, name, handicap):
        self.player_id = player_id
        self.name = name
        self.handicap = handicap

    def __repr__(self):
        return f'Player({self.player_id}, {self.name!r}, {self.handicap})'

Dataclass:

from dataclasses import dataclass

@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

Frozen dataclass (immutable):

@dataclass(frozen=True)
class CourseInfo:
    name: str
    slope_rating: int
    course_rating: float

Factory classmethod:

@classmethod
def from_csv_row(cls, row):
    return cls(
        player_id=int(row['player_id']),
        name=row['name'],
        handicap=float(row['handicap']),
    )

Generic CSV loader:

def load_csv(filepath, cls):
    # newline='' is the csv module's recommended way to open CSV files
    with open(filepath, newline='') as f:
        return [cls.from_csv_row(row) for row in csv.DictReader(f)]

Next up: Topic 06 – Pandas and Exploratory Data Analysis.